TnT -- A Statistical Part-of-Speech Tagger
13 Mar 2000
Abstract
Trigrams'n'Tags (TnT) is an efficient statistical part-of-speech tagger. Contrary to claims found elsewhere in the literature, we argue that a tagger based on Markov models performs at least as well as other current approaches, including the Maximum Entropy framework. A recent comparison has even shown that TnT performs significantly better for the tested corpora. We describe the basic model of TnT and the techniques used for smoothing and for handling unknown words. Furthermore, we present evaluations on two corpora.
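The abstract names three ingredients (a Markov model over tags, smoothing, and unknown-word handling) without spelling them out. The Python sketch below shows one way they can fit together, assuming a second-order (trigram) hidden Markov model decoded with the Viterbi algorithm, transition probabilities formed by linearly interpolating unigram, bigram and trigram estimates with weights set by deleted interpolation, and a crude suffix-based fallback for unknown words. It is a sketch in the spirit of a trigram tagger under these assumptions, not TnT's actual implementation; the class name `TrigramTagger`, the boundary tags `<s>`/`</s>`, the `suffix_len` parameter and the toy training sentences are all hypothetical.

```python
import math
from collections import Counter, defaultdict

START, END = "<s>", "</s>"  # hypothetical sentence-boundary tags (assumption)


class TrigramTagger:
    """Sketch of a second-order HMM tagger; not TnT's actual code."""

    def __init__(self, tagged_sents, suffix_len=3):
        self.suffix_len = suffix_len  # hypothetical parameter
        self.uni, self.bi, self.tri = Counter(), Counter(), Counter()
        self.hist1, self.hist2 = Counter(), Counter()  # counts of tag histories
        self.emit = defaultdict(Counter)    # word -> tag counts
        self.suffix = defaultdict(Counter)  # word suffix -> tag counts
        for sent in tagged_sents:
            for w, t in sent:
                self.emit[w][t] += 1
                self.suffix[w[-suffix_len:]][t] += 1
            tags = [START, START] + [t for _, t in sent] + [END]
            for i in range(2, len(tags)):
                t1, t2, t3 = tags[i - 2], tags[i - 1], tags[i]
                self.uni[t3] += 1
                self.hist1[t2] += 1
                self.bi[(t2, t3)] += 1
                self.hist2[(t1, t2)] += 1
                self.tri[(t1, t2, t3)] += 1
        self.n = sum(self.uni.values())
        self.tags = sorted(t for t in self.uni if t != END)
        self.lambdas = self._deleted_interpolation()

    def _deleted_interpolation(self):
        # Credit each trigram's count to the estimate (uni/bi/tri) that best
        # predicts it with that occurrence held out; ties favour higher order.
        weights = [0, 0, 0]
        for (t1, t2, t3), c in self.tri.items():
            h1, h2 = self.hist1[t2], self.hist2[(t1, t2)]
            est = [
                (self.uni[t3] - 1) / (self.n - 1) if self.n > 1 else 0.0,
                (self.bi[(t2, t3)] - 1) / (h1 - 1) if h1 > 1 else 0.0,
                (c - 1) / (h2 - 1) if h2 > 1 else 0.0,
            ]
            weights[max(range(3), key=lambda k: (est[k], k))] += c
        total = sum(weights)
        return [w / total for w in weights]

    def trans(self, t1, t2, t3):
        # Interpolated P(t3 | t1, t2), floored so that log() is always defined.
        l1, l2, l3 = self.lambdas
        p1 = self.uni[t3] / self.n
        p2 = self.bi[(t2, t3)] / self.hist1[t2] if self.hist1[t2] else 0.0
        h2 = self.hist2[(t1, t2)]
        p3 = self.tri[(t1, t2, t3)] / h2 if h2 else 0.0
        return max(l1 * p1 + l2 * p2 + l3 * p3, 1e-12)

    def emission(self, word, tag):
        if word in self.emit:  # known word: relative frequency P(w | t)
            return self.emit[word][tag] / self.uni[tag]
        # Unknown word: crude stand-in for suffix analysis, P(w|t) ~ P(t|suffix) / P(t)
        counts = self.suffix.get(word[-self.suffix_len:], self.uni)
        p_ts = (counts[tag] + 1e-3) / (sum(counts.values()) + 1e-3 * len(self.tags))
        return p_ts / (self.uni[tag] / self.n)

    def tag(self, words):
        # Viterbi over states (previous tag, current tag), scored in log space.
        best = {(START, START): (0.0, [])}
        for w in words:
            new = {}
            for (t1, t2), (score, path) in best.items():
                for t3 in self.tags:
                    e = self.emission(w, t3)
                    if e <= 0:
                        continue
                    s = score + math.log(self.trans(t1, t2, t3)) + math.log(e)
                    if (t2, t3) not in new or s > new[(t2, t3)][0]:
                        new[(t2, t3)] = (s, path + [t3])
            best = new
        if not words:
            return []
        # Add the sentence-end transition before choosing the best path.
        return max((s + math.log(self.trans(t1, t2, END)), path)
                   for (t1, t2), (s, path) in best.items())[1]


if __name__ == "__main__":
    train = [
        [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
        [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
        [("a", "DT"), ("dog", "NN"), ("sleeps", "VBZ")],
    ]
    tagger = TrigramTagger(train)
    print(tagger.tag(["the", "cow", "barks"]))  # -> ['DT', 'NN', 'VBZ']
```

In the usage example, "cow" never occurs in training, so its emission score is nearly flat across tags and the interpolated transitions decide that it is most likely `NN`. A real tagger would use a large annotated corpus and a more careful unknown-word model than this single-suffix fallback.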
Related Papers
Unigram Backoff vs. TnT: Evaluating Part of Speech Taggers (Introduction to Computational Linguistics)
Automated statistical part-of-speech (POS) tagging has been a very active research area for many years and is the foundation of natural language processing systems. This paper introduces and analyzes the performance of two part-of-speech taggers, namely the NLTK unigram backoff tagger and the TnT tagger, a trigram tagger. Experimental results show that the TnT tagger outperforms the NLTK unigra...
The Open Source Tagger HunPoS for Swedish
HunPoS, a freely available open source part-of-speech tagger—a reimplementation of one of the best performing taggers, TnT—is applied to Swedish and evaluated when the tagger is trained on various sizes of training data. The tagger’s accuracy is compared to other data-driven taggers for Swedish. The results show that the tagging performance of HunPoS is as accurate as TnT and can be used effici...
TnT Tagger for Malayalam with Fuzzy Rule Based Learning
TnT is an efficient statistical part-of-speech (POS) tagger based on a Hidden Markov Model. TnT performs well on known word sequences, but its performance degrades as the number of unknown words increases. In this paper, we propose a method to overcome this degradation using fuzzy rules. A fuzzy rule-based model is designed to provide TnT with sufficient information about the tag ...
Tagging the Dutch PAROLE Corpus
We discuss the annotation with part of speech and lemma of the Dutch PAROLE Internet Corpus. The PAROLE PoS tagger is a combination of statistical taggers. It includes the Markov tagger TnT and 3 taggers developed at the INL with the purpose of using other information besides the training data. Lemma is assigned by a deterministic procedure, based on an extensive lexicon. The output is in some ...